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Abstract 

We address the problem of contour detection via per-pixel classifications of edge 
point. To facilitate the process, the proposed approach leverages with DenseNet, 
an efficient implementation of multiscale convolutional neural networks (CNNs), 
to exUact an informative feature vector for each pixel and uses an SVM classifier 
to accomplish contour detection. In the experiment of contour detection, we look 
into the effectiveness of combining per-pixel features from different CNN layers 
and verify their performance on BSDS500. 


1 Introduction 

We consider deep nets to consttuct a per-pixel feature learner for contour detection. As the 
task is essentially a classification problem, we adopt deep convolutional neural networks (CNNs) 
to establish a discriminative approach. However, one subtle deviation from typical applications 
of CNNs should be emphasized. In our method, we intend to use the CNN architecture, e.g., 
AlexNet (iRrizhevskv et al.Ll2012l) . to generate features for each image pixel, not just a single feature 
vector for the whole input image. Such a distinction would call for a different perspe ctive of parame¬ 
ter fine-tuning so that a pre-Uained per-image CNN on ImageNet dDeng et all 120091 can be adapted 
into a new model for per-pixel edge classifications. To investigate the property of the features from 
different convolutional layers, we carry out a number of experiments to evaluate their effectiveness 
in performing contour detection on the benchmark BSDS Segmentation dataset dMartin et al.tl200lb . 

2 Per-Pixel CNN Features 

Learning features by employing a deep architecture of neural net has been shown to be effective, 
but most of the existing techniques focus on yielding a feature vector for an input image (or image 
patch). Such a design may not be appropriate for vision applications that require investigating image 
characteristics in pixel level. For contour detection, the central task is to decide whether an under¬ 
lying pixel is an edge point or not. Thus, it would be convenient that the deep network could yield 
per-p ixel features. To this end, we extract per-pixel CNN features in AlexNet (IKrizhevskv et all 
120121’) using DenseNet dlandola et al.U2014l) . and pixel-wise concatenate them to feed into a support 
vector machine (SVM) classifier. 

Our implementation uses DenseNet for CNN feature extraction owing to its efficiency, flexibility, 
and availability. DenseNet is an open source system that computes dense and multiscale features 
from the convolutional layers of a Caffe CNN based object classifier. The process of feature ex¬ 
traction proceeds as follows. Given an input image, DenseNet computes its multiscale versions and 
stitches them to a large plane. After processing the whole plane by CNNs, DenseNet would unstitch 
the descriptor planes and then obtain multiresolution CNN descriptors. 

The dimensions of convolutional features are ratios of the image size, e.g., one-fourth for Convl, 
and one-eighth for Conv2. We rescale feature maps of all the convolutional layers to the image size. 
That is, there is a feature vector in every pixel. The dimension of the resulting feature vector is 
1376 x 1, which is concatenated by Convl (96 x 1), Conv2 (256 x 1), Conv3 (384 x 1), Conv4 
(384 x 1), and Conv5 (256 x 1). 
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Table 1: Contour detection results of using CNN features from different layers. 



Convl 

Conv2 

Conv3 

Conv4 

Conv5 

Convl-5 

ODS 

.627 

.699 

.655 

.654 

.604 

.741 

OIS 

.660 

.718 

.670 

.667 

.620 

.759 

AP 

.625 

.712 

.619 

.615 

.546 

.757 


For classification, we use the combined per-pixel feature vectors to learn a binary linear SVM. It is 
worth noting that, in our multiscale setting, we train the SVM based on only the original resolution. 
In test time, we classify test images using both the original and the double resolutions. We average 
the two resulting edge maps for the final output of contour detection. 

3 Experimental Results 

We test our method on the Berkeley Segmentation Dataset and Benchmark (BSDS500) 
dMartin et all 120011 : lArbelaez et all 1201 l l) . To better assess the effects of the features of different 
layers, we report their respective performance of contour detection. The BSDS500 dataset con¬ 
tains 200 training, 100 validation, and 200 testing images. Boundaries in each image are labeled 
by several workers and are averaged to form the ground Uuth. The accuracy of contour detection 
is evaluated by three measures: the best F-measure on the dataset for a fixed threshold (ODS), the 
aggregate F-measure on the dataset for the best threshold in each image (OIS), and the average pre¬ 
cision (AP) on the full recall range ( lArbelaez et all 1201 lb . Prior to evaluation, we app ly a standard 
non-maximal suppression technique to edge maps to obtain thinned edges ( Cann y, (19861) . 

In Table Q] we see that features in Conv2 contribute the most, and then Conv3 and Conv4. These 
suggest that low- to mid-level features are most useful for contour detection, while the lowest- 
and higher-level features provide additional boost. Although features in Convl and Conv5 are less 
effective when employed alone, we achieve the best results by combining all five streams. It indicates 
that the local edge information in low-level features and the object contour information in higher- 
level features are both necessary for achieving high performance in contour detection tasks. 
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